Validating Explorative Clustering on a Hold-Out Test Set
Author
Abstract
Comparing clustering algorithms is much more difficult than comparing classification algorithms, owing to the unsupervised nature of the task and the lack of a precisely stated objective. We treat explorative cluster analysis as a predictive task (predicting the regions where data lump together) and propose a measure to evaluate its performance on a hold-out test set. The behavior of the measure is discussed for typical situations, and results on artificial and real-world datasets are presented for partitional, hierarchical, and density-based clustering algorithms. The proposed S-measure successfully reveals the individual strengths and weaknesses of each algorithm.
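The abstract does not define the S-measure, so it cannot be reproduced here. To make the general idea of "clustering as prediction, scored on a hold-out set" concrete, the sketch below swaps in a different, well-known hold-out criterion: prediction strength (Tibshirani and Walther). It is not the S-measure. Clusters learned on the training half act as a predictor for the test half, and the score is the fraction of test-cluster point pairs that the training model keeps together, minimized over test clusters. The helper names (`nearest`, `kmeans`, `prediction_strength`), the farthest-first seeding, and the 2D synthetic blobs are all illustrative assumptions.

```python
import math
import random
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[Point]:
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point whose distance to the nearest existing seed is largest.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; an empty group keeps its old centroid.
        updated = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def prediction_strength(train: List[Point], test: List[Point], k: int) -> float:
    """Prediction strength of a k-cluster model, scored on a hold-out test set."""
    train_c = kmeans(train, k)
    test_c = kmeans(test, k)
    test_labels = [nearest(p, test_c) for p in test]   # clustering of the test set itself
    pred_labels = [nearest(p, train_c) for p in test]  # labels predicted by the training model
    scores = []
    for j in range(k):
        members = [i for i, lab in enumerate(test_labels) if lab == j]
        if len(members) < 2:
            continue  # a singleton cluster has no pairs to check
        pairs = list(combinations(members, 2))
        agree = sum(pred_labels[a] == pred_labels[b] for a, b in pairs)
        scores.append(agree / len(pairs))
    # The worst-predicted test cluster determines the overall score.
    return min(scores) if scores else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(100)]
    train, test = data[0::2], data[1::2]  # simple alternating hold-out split
    for k in (2, 3, 4):
        print(k, round(prediction_strength(train, test, k), 3))
```

On three well-separated blobs, k = 3 should score 1.0, while a k that splits a blob typically scores lower, because the training and test models cut that blob along different directions. Unlike the S-measure described above, this criterion only checks whether groupings are reproducible; it says nothing about how tightly the predicted regions fit the test data.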
Similar Resources
Testing Several Rival Models Using the Extension of Vuong's Test and Quasi Clustering
The two main goals in model selection are, first, introducing an approach to test the homogeneity of several rival models and, second, selecting a set of reasonable models or estimating the rival model closest to the true one. In this paper we extend Vuong's method to several models in order to cluster them. Based on the working paper of Katayama (2008), we propose an approach to test whether rival models h...
Dispersion Parameters and Effect of Impeller Speed, Holdup and Volume Fraction of Dispersed Phase on Separation Efficiency, Mass Transfer Coefficient of Dispersed Phase and Distribution Coefficient on Mixer-Settler Set
An experimental study has been conducted on the hydrodynamics of a mixer-settler stage to obtain an appropriate design. In this paper, several tests were performed following a full factorial design of experiments. Each test was repeated seven times, which confirmed the repeatability of the tests (P = 1 bar and T = 25 °C). The Sauter diameter was determined by photographing both the mix...
Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes
Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death worldwide and represents one of the major causes of chronic morbidity. Cigarette smoking is the most important risk factor for COPD. In these patients, the airflow limitation is caused by a mixture of small airways disease and parenchyma destruction, the relative contribution of which varies from person to person....
Constructing and Validating a Q-Matrix for Cognitive Diagnostic Analysis of a Reading Comprehension Test Battery
Of paramount importance in the study of cognitive diagnostic assessment (CDA) is the absence of tests developed for small-scale diagnostic purposes. Currently, most of the research has been carried out on large-scale tests, e.g., TOEFL, MELAB, and IELTS. Even so, formative language assessment with a focus on informing instruction and on identifying students' strengths and...
A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population
The main task of cluster analysis is to assign a set of objects to groups such that objects in the same group (cluster) are more similar to each other than to objects in other clusters. The SSPCO optimization algorithm is a new optimization algorithm inspired by the behavior of a type of bird called the see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...